Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

Authors

  • Jens Dittrich
  • Jorge-Arnulfo Quiané-Ruiz
  • Alekh Jindal
  • Yagiz Kargin
  • Vinay Setty
  • Jörg Schad

Abstract

MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop, an open-source implementation of MapReduce, often does not match that of a well-configured parallel DBMS. In this paper we propose a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even ‘notice it’). To reach this goal, rather than changing a working system (Hadoop), we inject our technology at the right places through UDFs only and affect Hadoop from inside. This has three important consequences. First, Hadoop++ significantly outperforms Hadoop. Second, any future changes to Hadoop can be used directly with Hadoop++ without rewriting any glue code. Third, Hadoop++ does not need to change the Hadoop interface. Our experiments show the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.
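
The contrast between scan-oriented processing and index-assisted processing behind an unchanged interface can be illustrated with a small sketch. This is not the Hadoop++ implementation and uses no Hadoop API; the names `scan_reader`, `build_indexed_split`, `index_reader`, and `run_map` are hypothetical, and a "split" is assumed to be a plain list of (key, value) pairs. The only point shown is that a user-supplied reader can swap a full scan for an index lookup while the map function stays the same.

```python
from bisect import bisect_left, bisect_right


def scan_reader(split, low, high):
    """Baseline (scan-oriented): touch every record, then filter by key."""
    return [(k, v) for k, v in split if low <= k <= high]


def build_indexed_split(split):
    """Assumed load-time step: sort the split by key and keep the sorted
    key list as a very simple index over the records."""
    records = sorted(split, key=lambda kv: kv[0])
    keys = [k for k, _ in records]
    return keys, records


def index_reader(indexed_split, low, high):
    """User-defined reader: binary-search the key list and return only
    the qualifying slice instead of scanning the whole split."""
    keys, records = indexed_split
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return records[start:end]


def run_map(records, map_fn):
    """Stand-in for the framework: it applies map_fn to whatever the
    reader hands over and does not know how the records were selected."""
    out = []
    for k, v in records:
        out.extend(map_fn(k, v))
    return out


def upper_map(key, value):
    """Example map function, identical for both readers."""
    return [(key, value.upper())]


split = [(5, "e"), (1, "a"), (3, "c"), (9, "i"), (7, "g")]
indexed = build_indexed_split(split)

scan_result = run_map(scan_reader(split, 3, 7), upper_map)
index_result = run_map(index_reader(indexed, 3, 7), upper_map)
```

Both readers feed the same `upper_map` through the same `run_map`, so the selection strategy changes without the framework stand-in being modified. The two results contain the same records, although the indexed variant returns them in key order.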

Similar resources

Only Aggressive Elephants are Fast Elephants

Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider’s orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We ...

Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce

Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm, called MapReduce, and its open source implementation Hadoop, has been widely adopted due to its impressive scalability and flexibility to handle structured as well as unstructured data. In this paper, we describe our data warehouse system, called Cheetah, built on to...

Antibacterial Activity of Elephant Garlic and Its Effect against U2OS Human Osteosarcoma Cells

Objective(s): The present study was designed to investigate the antibacterial function and pharmacological effect of elephant garlic (Allium ampeloprasum var. ampeloprasum) on U2OS human osteosarcoma cells. Materials and Methods: Seven kinds of bacteria were reconstituted, inoculated and tested in this research to evaluate elephant garlic antibacterial activity. By the means ...

Semiology the Animal Motifs on Medallion Fabrics during the Sasanian Era (Case Study: Ram, Boar, Lion, Deer and Elephant)

In a wide range of fabrics attributed to the Sasanian era, animal motifs are observable in the form of geometric designs, including medallions. The case study in the present paper concerns animal motifs of Sasanian textiles, with a focus on natural and non-domesticated animals that could be depicted in a medallion design alone, without the presence of a hunter. According to th...

Identifying Words to Explain to a Reader: A Preliminary Study

In previous work we tried to automatically annotate text with semantic assistance on words. We augmented text with short "factoids" about words such as "cheetah can be a kind of cat. Is it here?" We used WordNet (Fellbaum 1998) to retrieve synonyms (assistance is like aid), hypernyms (cheetah is a kind of cat), and antonyms (beautiful is the opposite of ugly), and focused on the "low-hanging fr...

Journal title:
  • PVLDB

Volume 3

Publication date: 2010